Abstract. Many discrete mathematics problems in phylogenetics are defined
in terms of the relative labeling of pairs of leaf-labeled trees. These relative
labelings are naturally formalized as tanglegrams, which have previously been
an object of study in coevolutionary analysis. Although there has been
considerable work on planar drawings of tanglegrams, they have not been fully
explored as combinatorial objects until recently. In this paper, we describe how
many discrete mathematical questions on trees “factor” through a problem on
tanglegrams, and how understanding that factoring can simplify analysis.
Depending on the problem, it may be useful to consider an unordered version
of tanglegrams, and/or their unrooted counterparts. For all of these
definitions, we show how the isomorphism types of tanglegrams can be understood
in terms of double cosets of the symmetric group, and we investigate their
automorphisms. Understanding tanglegrams better will isolate the distinct
problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces
associated with such problems.


1. Introduction 

Consider the problem of computing the subtree-prune-regraft (SPR) distance 
between two leaf-labeled phylogenetic trees. An SPR move cuts one edge of the 
tree and then reattaches the resulting rooted subtree at another edge (Figure 1).
The SPR distance between two (phylogenetic, meaning leaf-labeled) trees $T_1$ and
$T_2$ is the minimum number of SPR moves required to transform $T_1$ into $T_2$. This
distance is of fundamental importance in phylogenetics, and many papers have been 
written both applying and investigating properties of this distance. 

Say that we wanted to calculate the SPR distance between every pair of trees on
a certain number of leaves. Naively this would require a large number of SPR
calculations, namely the number of leaf-labeled phylogenetic trees choose two.
However, the distance between two such trees does not depend on the actual labels
of $T_1$ and $T_2$, so one can permute the leaf labels without changing the distance.
Furthermore, a path made by intermediate trees between the two trees could also
have its labels permuted in order to give a path between the trees with permuted
leaf labels. Thus, problems like SPR distance do not concern the actual leaf labels
as such, but rather use the leaf labels as markers that can be used to map leaves
of one phylogenetic tree on to another: the problem and its solutions are actually
defined in terms of a relative leaf labeling (Figure 1).

Figure 1. Two equivalent subtree-prune-regraft moves applied
to trees which are identical up to relabeling. The number of such 
moves required to transform one tree into another only depends on 
the relative leaf labeling between the two trees. 


Analogous discrete mathematics problems and objects defined in terms of tuples
of labeled combinatorial objects, but without direct reference to the labels
themselves, are ubiquitous in computational biology. Any distance between pairs of
trees that is computed in terms of tree modifications, such as the (rooted or
unrooted) subtree-prune-regraft moves described above, nearest-neighbor
interchange, and tree bisection and reattachment, satisfies this condition. Such
moves are used as the basis of both maximum-likelihood heuristic search and
Bayesian Markov chain Monte Carlo (MCMC) tree reconstruction. The corresponding
graph, in which trees form vertices and a collection of moves form edges, has
natural symmetries relating pairs of points that have the same relative labeling. For
example, hitting times of simple random walks on graphs formed by such moves for
given start and end trees [6] are defined in terms of relative labelings between
the start and end trees. The same is true for more complex random walks such as
Markov chain Monte Carlo using a label-invariant likelihood, as would be used for
sampling from a prior distribution on trees. Graph characteristics such as
Ricci-Ollivier curvature [10] under simple random walks or MCMC with a
label-invariant likelihood are expressed in terms of relative tree labelings.
Analogous considerations hold for the problem of species delimitation, which can
naturally be phrased in terms of inference of a partition of relatively labeled
objects: neither distances between partitions [12] nor the graphs underlying MCMC
over these partitions [13] actually refer to labels themselves.

Figure 2. The tanglegram corresponding to the pairs of trees in
Figure 1, with the bijection shown in gray. When considered as a
graph, the black edges are called tree edges, and the gray edges are
called between-leaf edges.

The concept of a pair of rooted phylogenetic trees with a relative leaf labeling
has been formalized as a tanglegram [14, 15]. A tanglegram is a pair of trees on
the same set of leaves with a bijection between the leaves in the two trees [16]
(Figure 2). There has been extensive work on the problem of finding the layout
of a given tanglegram in the plane that minimizes crossings, with the goal of most
clearly visualizing co-evolutionary relationships between species [16-21].


However, we are not aware of any work considering tanglegrams as a convenient
formalization of the notion of a relative leaf labeling in the context of studying
pairs of labeled phylogenetic trees. There has also been little work enumerating
or finding other properties of tanglegrams until recently [22]. In addition, more
challenging and important problems in mathematical phylogenetics reduce to
questions on relatively-labeled collections of more than two trees, and
correspondingly one can extend the notion of tanglegram to more than two trees.
For example, “supertree” methods reconstruct a tree from collections of trees,
each of which is typically considered to express information about the larger tree
[23, 24], which in fact is a problem on multi-tree tanglegrams. The same is true
for the minimal hybridization network [25] and maximum agreement subtree [26, 27]
problems. Thus many problems in the discrete mathematics of phylogenetic trees
“factor” through a problem concerning a generalized version of a tanglegram.

With this motivation for studying tanglegrams in more depth, here we formalize
more general notions of tanglegram, describe their symmetries, observe that
tanglegrams have a convenient algebraic formulation as double cosets of the
symmetric group, and provide some enumeration results for four types of tanglegram.


2. Tanglegrams 

An unrooted binary tree T is a finite graph for which there is a unique path 
between every pair of vertices, and such that every non-leaf vertex has degree three. 
A rooted tree is an unrooted tree with a distinguished node called the root. We will 
also make the assumption common in phylogenetics that the root of a rooted tree 
has degree two, and that there are no degree-two nodes other than the root (if there 
is a root). The leaves $L(T)$ of a tree $T$ are the degree-one vertices of the tree.

Definition 1. Let $T$ and $S$ be trees with the same number of leaves. An ordered
tanglegram $Y$ on $(T, S)$ is an ordered triple $(T, \phi, S)$, where $\phi$ is a
bijection $L(T) \to L(S)$.

The graph of the tanglegram $Y$ is the graph formed from the union of $T$ and $S$
by adding an edge from each leaf $x$ in $T$ to the corresponding leaf $\phi(x)$ in
$S$. We will distinguish these between-leaf edges from the tree edges of $T$ and
$S$ (Figure 2).

We have defined tanglegrams in terms of ordered triples $Y = (T, \phi, S)$, so
$Y' = (S, \phi^{-1}, T)$ is a different tanglegram. This is a sensible definition
when considering sequences of trees with an inherent directionality. However, often
there is no such directionality, such as for subtree-prune-regraft moves, which are
easily reversed. This motivates the following concept:

Definition 2. An unordered tanglegram is a pair $(\{T, S\}, \phi)$ where $\{T, S\}$
is an unordered set of two trees, and $\phi$ is a bijection between $L(T)$ and $L(S)$.

2.1. Automorphisms and tanglegram equivalence. Let $V(X)$ denote the vertex
set of a graph $X$. An isomorphism between unrooted trees $T$ and $S$ is a bijective
map $h : V(T) \to V(S)$ in which $h$ maps edges of $T$ to edges of $S$. For a rooted
tree, we add the requirement that an isomorphism must map the root node of $T$ to
the root node of $S$. An automorphism of a tree $T$ is an isomorphism of $T$ with
itself. It is clear that the degree of a node (i.e. the number of adjacent nodes) is
preserved under isomorphisms. In phylogenetics, it is common that the root of a tree
is the only node of degree two. In this case, there is no distinction between
isomorphisms of rooted trees and isomorphisms of these trees as unrooted trees
because degrees are preserved under isomorphism.

We start with an “obvious” lemma, the proof of which can be found in the
Appendix. First note that any isomorphism between trees $T$ and $S$ preserves the
leaf sets $L(T)$ and $L(S)$, and therefore induces a bijection between $L(T)$ and
$L(S)$.

Lemma 3. An isomorphism between (rooted or unrooted) trees $T$ and $S$ is uniquely
determined by the induced bijection between $L(T)$ and $L(S)$. In particular, an
automorphism of a tree $T$ is uniquely determined by the induced permutation of the
leaf set $L(T)$. □

Thus we will often consider an isomorphism as such a bijection $L(T) \to L(S)$.

Definition 4. Given two tanglegrams $Y = (T, \phi, S)$ and $Y' = (T, \phi', S)$ on
the same pair of trees, an isomorphism of $Y$ and $Y'$ is defined by a pair of
automorphisms $g : L(T) \to L(T)$ and $h : L(S) \to L(S)$ satisfying
$h \circ \phi = \phi' \circ g$.

The condition in the definition can be visualized in the commutative diagram 

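$$
\begin{array}{ccc}
L(T) & \xrightarrow{\ \phi\ } & L(S) \\
{\scriptstyle g}\downarrow & & \downarrow{\scriptstyle h} \\
L(T) & \xrightarrow{\ \phi'\ } & L(S).
\end{array}
$$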

Note that if two tanglegrams $Y_1$ and $Y_2$ are isomorphic, then there is a 1-1
map from the graph of $Y_1$ to the graph of $Y_2$ which maps between-leaf edges to
between-leaf edges.

2.2. Symmetries of trees. In order to describe the ensemble of tanglegrams it
is necessary to review the symmetries of the trees in the tanglegram. Although
this material is classical, we were not able to find a simple presentation, and so
provide one here. We will assume familiarity with the basics of group theory
(covered by dozens of textbooks, e.g. [28]). Automorphisms of a tree $T$ form a
group under composition. Using $\mathfrak{S}_n$ to denote the symmetric group on $n$
objects, leaf automorphisms of $T$ form a subgroup $A(T)$ of $\mathfrak{S}_n$, where
$n = |L(T)|$.

To enumerate symmetries of trees it is convenient to use the notion of a wreath
product; we will only define and use the wreath product in the case when the acting
group is $\mathfrak{S}_k$. Use $G^k$ to denote the $k$-fold direct product
$G \times \cdots \times G$.

Given a group $G$, the wreath product $G \wr \mathfrak{S}_k$ of $G$ by
$\mathfrak{S}_k$ can be described as the Cartesian product $G^k \times \mathfrak{S}_k$
equipped with the following group operation. First recall that the group operation
on $G^k$ is defined by applying $G$'s group operation component-wise. An element of
$\mathfrak{S}_k$ acts on $G^k$ by permuting the components, such that the action of
$\sigma \in \mathfrak{S}_k$ on $g \in G^k$ is the element $\sigma(g) \in G^k$ with
$i$th component $g_{\sigma(i)}$. Given elements $g, g'$ in $G^k$ and
$\sigma, \sigma' \in \mathfrak{S}_k$, the wreath group law is:

$$(g, \sigma)\,(g', \sigma') := (g\,\sigma(g'),\ \sigma\sigma').$$


For rooted trees, Jordan and Pólya observed that the automorphism
group of any rooted tree can be built by repeated direct products and wreath
products of symmetric groups as follows. In the simplest case, assume a rooted
tree $T$ for which the root has two daughter subtrees $T_1$ and $T_2$. If $T_1$ and
$T_2$ are isomorphic (and thus have the same automorphism groups), the automorphism
group of $T$ is the wreath product $A(T_1) \wr \mathfrak{S}_2$. That is, its symmetry
group is two copies of $A(T_1)$ along with the symmetry exchanging $T_1$ and $T_2$,
equipped with the group operation that appropriately exchanges the subtrees before
applying symmetries to the subtrees. If $T_1$ and $T_2$ are not isomorphic, then
$A(T)$ is simply the direct product $A(T_1) \times A(T_2)$.

Now let $T$ be a tree whose root has some number of daughters, which are the roots
of subtrees $T_1, \ldots, T_r$. We can reorder the subtrees and partition them into
$N$ classes:

$$T_1, \ldots, T_{i_1}, \qquad T_{i_1+1}, \ldots, T_{i_2}, \qquad \ldots, \qquad T_{i_{N-1}+1}, \ldots, T_{i_N},$$

such that the subtrees in each class are isomorphic to one another and the
subtrees in different classes are not isomorphic. This defines integers
$i_1, \ldots, i_N$; take $i_0$ to be zero. A more general version of the argument
above establishes

Theorem 5 (Jordan, 1869). $A(T)$ is the direct product $A_1 \times \cdots \times A_N$,
where $A_j$ is the wreath product of $A(T_{i_j})$ with the symmetric group
$\mathfrak{S}_{i_j - i_{j-1}}$. □

This defines the automorphism group of a rooted tree recursively, where of course 
the automorphism group of a single leaf is trivial. 
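
The recursion of Theorem 5 can be sketched in a few lines of Python. This is only an illustration, not the code accompanying the paper: it assumes a rooted tree is given as nested tuples whose non-tuple entries are leaves, and it computes only the order $|A(T)|$, using $|G \wr \mathfrak{S}_k| = |G|^k\,k!$. The helper names `canonical_form` and `automorphism_order` are hypothetical.

```python
from math import factorial


def canonical_form(tree):
    """Label-free canonical string of a rooted tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return "*"  # all leaves look alike once labels are ignored
    return "(" + ",".join(sorted(canonical_form(c) for c in tree)) + ")"


def automorphism_order(tree):
    """Order of the leaf automorphism group A(T), via the recursion of Theorem 5."""
    if not isinstance(tree, tuple):
        return 1  # a single leaf has a trivial automorphism group
    # Group the daughter subtrees into isomorphism classes.
    classes = {}
    for child in tree:
        classes.setdefault(canonical_form(child), []).append(child)
    order = 1
    for members in classes.values():
        k = len(members)
        # |A(T_i) wr S_k| = |A(T_i)|^k * k!
        order *= automorphism_order(members[0]) ** k * factorial(k)
    return order


print(automorphism_order((1, ((2, 3), ((4, 5), 6)))))  # 4
print(automorphism_order(((1, 2), (3, 4), (5, 6))))    # 48
```

Two subtrees are treated as isomorphic exactly when their sorted canonical strings agree, which is what allows the daughters to be grouped into classes before applying the wreath product count.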


Example 6. Let $T_n$ denote the perfectly balanced binary tree on $2^n$ leaves and
let $G_n = A(T_n)$. Then $G_1 = \mathfrak{S}_2$, and for each $n \ge 2$,
$G_n = G_{n-1} \wr \mathfrak{S}_2$. Moreover, $|G_n| = 2|G_{n-1}|^2$.

Example 7. The symmetry group of the Newick-format tree (1,((2,3),((4,5),6)));
(shown as the upper-left tree of Figure 1) is the direct product of the symmetry
groups of (2,3) and ((4,5),6). Each of these symmetry groups is $\mathfrak{S}_2$.

The automorphism group of an unrooted tree will become clear after we describe
a classical and mathematically natural way to root an unrooted tree: at the centroid.
Let $T$ be a tree, and let $x$ be a node of $T$. If we remove $x$ as well as the
edges attached to $x$ from $T$, we obtain a number of disjoint connected and rooted
subtrees, $X_1, \ldots, X_k$.

Definition 8. The weight of $x$, $w(x)$, is defined as the maximum number of nodes
of the subtrees $X_1, \ldots, X_k$.

Definition 9. The node $x$ is said to be a centroid of $T$ if $w(x)$ is minimal over
all nodes of $T$.

It is clear that any automorphism of $T$ maps a centroid to a centroid, a fact
which we will use to find a root fixed under leaf automorphism. Centroids are
unique or nearly so, as shown by the following theorem, the proof of which can be
found as a guided exercise in Knuth's The Art of Computer Programming, §2.3.4.4.

Theorem 10 (Jordan, 1869). Every tree has either: 

1. a unique centroid or 

2. two adjacent centroids. 

In case 2, every automorphism either preserves the centroids or exchanges them. □ 

Let $T$ be an unrooted tree, and let $T^*$ be the rooted tree formed by rooting $T$
either at the unique centroid, or at a new node inserted in the edge joining a pair
of centroids.

Corollary 11. The automorphism group of an unrooted tree $T$ is identical to the
automorphism group of the associated rooted tree $T^*$. □
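
As a concrete illustration of Definitions 8 and 9, the following sketch finds the centroid(s) of a tree. It assumes the tree is supplied as a dictionary mapping each node to the set of its neighbours; that representation and the function name `centroids` are choices made for this example only.

```python
def centroids(adjacency):
    """Return the sorted centroid(s) of a tree given as {node: set_of_neighbours}."""
    n = len(adjacency)
    root = next(iter(adjacency))
    # Depth-first traversal recording each node's parent and the visiting order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adjacency[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    # Subtree sizes, accumulated from the leaves upward.
    size = {v: 1 for v in adjacency}
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    # w(x): the number of nodes in the largest component left after deleting x.
    weight = {}
    for v in adjacency:
        parts = [size[u] for u in adjacency[v] if u != parent[v]]
        if parent[v] is not None:
            parts.append(n - size[v])
        weight[v] = max(parts) if parts else 0
    best = min(weight.values())
    return sorted(v for v in adjacency if weight[v] == best)


# The unrooted four-leaf tree: leaves a, b hang off u; leaves c, d hang off v.
quartet = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"v"},
    "u": {"a", "b", "v"}, "v": {"c", "d", "u"},
}
print(centroids(quartet))  # ['u', 'v']
```

The quartet falls under case 2 of Theorem 10: it has two adjacent centroids, so the associated rooted tree is obtained by inserting a new root in the edge between them.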

Example 12. The symmetry group of the six-leaf unrooted tree with three two-leaf
subtrees (Newick format ((1,2),(3,4),(5,6));) is $\mathfrak{S}_2 \wr \mathfrak{S}_3$.

2.3. Double cosets and enumeration of tanglegrams. We are now ready to
algebraically describe the set of tanglegrams on a pair of $n$-leaf trees. Assume
$n$-leaf trees $T$ and $S$, which are both rooted or both unrooted. Arbitrarily mark
the elements of the leaf sets $L(T)$ and $L(S)$ with the same set of $n$ symbols,
such that we can identify both $A(T)$ and $A(S)$ as subgroups of $\mathfrak{S}_n$.
Using this same marking, we can also think of the bijections from $L(T)$ to $L(S)$
as being elements of $\mathfrak{S}_n$, thus these elements of $\mathfrak{S}_n$ give
tanglegrams on $T$ and $S$. Recall Definition 4, stating that the set of bijections
$\phi'$ giving the same tanglegram as a given $\phi$ are those for which there exist
automorphisms $g \in A(T)$ and $h \in A(S)$ such that $h \circ \phi = \phi' \circ g$.
This criterion is equivalent to $\phi' = h \phi g^{-1}$ as group elements in
$\mathfrak{S}_n$. The set of elements satisfying such a criterion is called a double
coset.

Definition 13. Given a subgroup $J$ of a group $G$ and $g \in G$, the right coset
$Jg$ (resp. left coset $gJ$) of $J$ in $G$ is the set of elements
$\{jg \mid j \in J\}$ (resp. $\{gj \mid j \in J\}$). The number of right cosets of
$J$ in $G$ is equal to the number of left cosets. This number is defined as the index
of $J$ in $G$ and is denoted $[G : J]$. Given two subgroups $J$ and $K$ of $G$, the
double coset $JgK$ for some $g \in G$ is the set of elements
$\{jgk \mid j \in J, k \in K\}$.

Any two right (left) cosets of $J$ in $G$ are either identical or disjoint and the
number of elements in any coset is the same, i.e. $|J|$. In contrast to single cosets
(left or right), the number of elements in a double coset may vary. We state
these observations, and the equivalent observations in the unordered case, as a
proposition.

Proposition 14. Given two trees $T$ and $S$ with $n$ leaves,

• the set of tanglegrams isomorphic to a tanglegram $(T, w, S)$ is in 1-1
correspondence with the double coset $A(S)\,w\,A(T)$ of $\mathfrak{S}_n$;

• the set of unordered tanglegrams isomorphic to $(\{T, S\}, w)$ is in 1-1
correspondence with equivalence classes of double cosets $A(S)\,w\,A(T)$, where
pairs of cosets $HwK$ and $Kw^{-1}H$ are deemed equivalent.

□

Note that the actual 1-1 correspondence depends on the marking of $T$ and $S$.

Figure 3. The two unrooted binary tanglegrams with four leaves.


Here are some useful facts concerning cosets [28, 33]:

• any two double cosets are either disjoint or identical;

• every double coset is a disjoint union of right cosets and a disjoint union of
left cosets;

• the number of right cosets of $H$ in $HgK$ is the index $[K : K \cap g^{-1}Hg]$,
and the number of left cosets of $K$ in $HgK$ is the index $[H : H \cap gKg^{-1}]$.

Combining these facts with the proposition above, we get: 


Proposition 15. The number of bijections from $L(T)$ to $L(S)$ giving an ordered
tanglegram isomorphic to $Y = (T, w, S)$ is equal to
$|A(S)|\,[A(T) : A(T) \cap w^{-1}A(S)w]$, or equivalently
$|A(T)|\,[A(S) : A(S) \cap wA(T)w^{-1}]$. □


Example 16. Let $T$ and $S$ both be the unique binary unrooted tree with 4 leaves.
There are two distinct tanglegrams on $(T, S)$ in both the ordered and unordered
cases (Figure 3). The automorphism group of either tree, $A(T)$, is the wreath
product of $\mathfrak{S}_2$ by $\mathfrak{S}_2$, thus of order 8 (set theoretically
$\mathbb{Z}_2 \times \mathbb{Z}_2 \times \mathbb{Z}_2$). Marking the leaves with
the integers 1 through 4 such that (1,2) and (3,4) are both sister pairs,
$G = A(T)$ is generated by $\{(12), (34), (13)(24)\} \subset \mathfrak{S}_4$.

The symmetric group $\mathfrak{S}_4$ contains $4! = 24$ elements. Every double coset
is a disjoint union of single cosets, and $G$ contains 8 elements, therefore the
number of elements in a double coset is a multiple of 8. Moreover, since the double
cosets partition $\mathfrak{S}_4$, we either have 3 double cosets (each of 8
elements), or 2 double cosets (one of 8 elements and one of 16 elements), or one
double coset (of 24 elements). Taking $w = (23)$, we calculate:

$$G \cap wGw^{-1} = \{(),\ (12)(34),\ (13)(24),\ (14)(23)\}.$$

Since $w$ is its own inverse, $w^{-1}Gw = wGw^{-1}$. Using the properties of double
cosets, we find that the number of single cosets in the double coset $GwG$ is the
index $[G : G \cap w^{-1}Gw] = 8/4 = 2$. Thus this double coset has 16 elements, and
so there must be two double cosets, corresponding to the two tanglegrams.
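
The count in Example 16 can be checked by brute force. The sketch below is an illustration only (the paper's own enumeration relies on GAP and Sage): it assumes permutations are stored as tuples on the points 0 to 3 standing for leaves 1 to 4, and the helper names `compose`, `generate`, and `double_cosets` are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)


def generate(generators, n):
    """Subgroup of S_n generated by the given permutations."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def double_cosets(H, K, n):
    """Partition S_n into the double cosets H w K."""
    remaining = set(permutations(range(n)))
    cosets = []
    while remaining:
        w = min(remaining)
        coset = {compose(compose(h, w), k) for h in H for k in K}
        cosets.append(coset)
        remaining -= coset
    return cosets


# Generators (12), (34), (13)(24), with leaves 1..4 stored as points 0..3.
G = generate([(1, 0, 2, 3), (0, 1, 3, 2), (2, 3, 0, 1)], 4)
sizes = sorted(len(c) for c in double_cosets(G, G, 4))
print(len(G), sizes)  # 8 [8, 16]
```

Exhaustive enumeration of this kind is only feasible for very small $n$, since it walks through all $n!$ permutations.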

2.4. Symmetries of tanglegrams. 

Definition 17. An automorphism of an ordered tanglegram $Y$ is an automorphism
of the graph of $Y$ which maps each tree to itself. An automorphism of an unordered
tanglegram $Y$ is an automorphism of the graph of $Y$ which preserves the
between-leaf edges, so an automorphism of an unordered tanglegram either maps each
tree to itself or switches the two trees. If $Y$ is a rooted tanglegram, then an
automorphism of $Y$ is required to preserve the roots of the two trees.

Figure 4. An ordered rooted and an unordered unrooted tanglegram
formed by two copies of the same unrooted tree with no
automorphism that switches the trees forming each tanglegram.
These examples show that the second condition of Proposition 18
is not always satisfied.

If the automorphism $f : Y \to Y$ exchanges the two trees, $f$ is described by a
pair of isomorphisms: $g_1 : T \to S$ and $g_2 : S \to T$. For any leaf $x$ of $T$,
the bijective pair $(x, \phi(x))$ must map to another bijective pair
$(g_2(\phi(x)), g_1(x))$. This implies that $g_1(x) = \phi(g_2(\phi(x)))$, and thus
in general that $g_1 = \phi \circ g_2 \circ \phi$. If we put the same set of
distinguishing marks on the leaves of the trees $T$ and $S$, we may consider the
bijection $\phi$ to be an element of the symmetric group $\mathfrak{S}_n$. With
these conventions, we have shown that there exist $g_1 \in A(T)$ and $g_2 \in A(T)$
such that $g_1 = \phi\, g_2\, \phi$ as group elements when there is an automorphism
that switches the two trees. The converse follows from reversing this argument. In
summary:

Proposition 18. If $Y$ is an unordered tanglegram, then there exists an
automorphism of $Y$ that switches the two trees if and only if:

• the trees $T$ and $S$ are isomorphic, and

• $\phi A(T) \phi \cap A(T) \neq \emptyset$.

□

On the other hand, if $f : Y \to Y$ is an automorphism which maps each tree
to itself, then $f$ is described by two automorphisms $g : T \to T$ and
$h : S \to S$ satisfying $\phi \circ g = h \circ \phi$ when restricted to the
leaves, or $g = \phi^{-1} h \phi$ as elements of the symmetric group.

Proposition 19. Assume an ordered tanglegram $Y = (T, \phi, S)$, or an unordered
tanglegram $Y = (\{T, S\}, \phi)$. Set $H = A(T) \cap \phi^{-1} A(S) \phi$.

1. If $Y$ is ordered or $T$ is not isomorphic to $S$, then $A(Y) = H$.

2. If $Y$ is unordered and $T$ is isomorphic to $S$, then

a. if $A(T) \cap \phi A(T) \phi \neq \emptyset$, then $A(Y)$ contains $H$ as a
subgroup of index 2;

b. otherwise, $A(Y) = H$.

□


Similar to the case for trees, tanglegram automorphisms are determined entirely 
by their action on the leaves of one of the trees. 

2.5. Labeled tanglegrams. Analogous to the concept of a leaf-labeled tree, there 
is a concept of a labeled tanglegram. 

Definition 20. A labeled tanglegram is a tanglegram along with a bijective map 
of the label set X to the leaves of one of the trees. 

This is analogous to the definition of a leaf-labeled phylogenetic tree [34]. The
other tree can be considered to be labeled by the composition of the labeling with
the bijection. Applying this labeling to both trees and then forgetting the bijection
gives a pair of leaf-labeled trees on the same label set, and each such pair of
leaf-labeled trees obviously determines a labeled tanglegram. Thus, labeled
tanglegrams are in one-to-one correspondence with pairs of leaf-labeled phylogenetic
trees. If the tanglegram is ordered, then this is an ordered pair of trees, and if
unordered it is unordered.

It is natural to ask how many distinct labeled $n$-tanglegrams have the same
underlying ordered or unordered tanglegram. Each leaf has a distinct label, such
that the symmetric group acts freely on these labels. By the orbit-stabilizer
theorem, we obtain:

Proposition 21. The number of leaf-distinct labelings of a given $n$-tanglegram $Y$
is equal to $n!/|A(Y)|$. □

This is true for ordered and unordered tanglegrams, using their respective
automorphism definitions. For example, there are 12 labelings for the ordered
tanglegram (1,(2,(3,4))); (((1,2),3),4); but only 6 when considered as an unordered
tanglegram.

Given a means of sampling uniformly from tanglegrams [22], we can use this
proposition to obtain a weighted sampling scheme for the uniform distribution
across pairs of phylogenetic trees on the same labeling set. For example, assume
we wanted to approximate the expectation of a function $f$ on uniformly sampled
pairs of labeled trees, but which is constant on pairs of trees that make the same
tanglegram (such as SPR distance). Then

$$\sum_{T_1, T_2} f(T_1, T_2)\, P(T_1, T_2) = \sum_{Y} \sum_{T_1, T_2} f(T_1, T_2)\, P(T_1, T_2 \mid Y)\, P(Y),$$

where if $f(T_1, T_2) = f(T_2, T_1)$ for all $T_1, T_2$ then the right hand sum can
be over unordered tanglegrams $Y$, and otherwise it is over ordered tanglegrams $Y$.
Here $P(T_1, T_2 \mid Y)$ is simply the indicator function expressing if $T_1$ and
$T_2$ make $Y$, divided by the number of pairs of labeled trees making $Y$ as
enumerated in Proposition 21. Rather than sampling pairs of trees uniformly and
calculating an empirical expectation as on the left side, we can get a lower
variance estimator by sampling tanglegrams uniformly and weighting them as on the
right hand side. Such a means of sampling uniformly from tanglegrams in the rooted
binary ordered case is given in [22].
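
One way to read the weighting on the right hand side is as a self-normalized importance-sampling estimate. The sketch below is an assumption-laden illustration, not the authors' procedure: it supposes we already hold a list of tanglegrams drawn uniformly at random, each summarized by the value of $f$ on it and by $|A(Y)|$, and it uses Proposition 21 to weight each draw in proportion to $n!/|A(Y)|$. The function name `weighted_expectation` is hypothetical.

```python
def weighted_expectation(samples):
    """Self-normalized estimate of E[f] over uniform pairs of labeled trees.

    `samples` is an iterable of (f_value, aut_order) pairs, one per tanglegram
    drawn uniformly at random, where aut_order is |A(Y)|.
    """
    numerator = 0.0
    denominator = 0.0
    for f_value, aut_order in samples:
        # A tanglegram Y accounts for n!/|A(Y)| labeled pairs; the common
        # factor n! cancels between numerator and denominator.
        weight = 1.0 / aut_order
        numerator += weight * f_value
        denominator += weight
    return numerator / denominator
```

If `samples` lists every tanglegram exactly once, the function returns the exact expectation rather than an estimate, which gives a simple check on small cases.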


3. Variants and special cases


3.1. Multiple trees. The definition of a tanglegram on two trees can be
generalized to a version on multiple trees.

Definition 22. Given trees $T_1, \ldots, T_n$ with the same number of leaves, a
multi-tanglegram on this set of trees is given by a pair of tuples
$((T_1, \ldots, T_n), (\phi_{ij})_{1 \le i, j \le n})$, in which
$\phi_{ij} : L(T_i) \to L(T_j)$ are bijections satisfying:

1. $\phi_{ii} = 1$ for all $i$;

2. $\phi_{ji} = \phi_{ij}^{-1}$ for all $i, j$;

3. $\phi_{ik} = \phi_{jk} \circ \phi_{ij}$ for all $i, j, k$.

We can also generalize the definition of isomorphism to multi-tanglegrams on n 
trees. 

Definition 23. Two multi-tanglegrams $Y = ((T_1, \ldots, T_n), (\phi_{ij}))$ and
$Y' = ((T_1, \ldots, T_n), (\phi'_{ij}))$ on the same list of trees are isomorphic if
there exist automorphisms $\{g_i : T_i \to T_i\}$ and $\{h_i : T_i \to T_i\}$
satisfying $h_j \circ \phi_{ij} = \phi'_{ij} \circ g_i$ for $i, j = 1, \ldots, n$.

It is clear that the bijections $\phi_{ij}$ are completely determined by the $n - 1$
bijections $\{\phi_{1i}\}_{i = 2, \ldots, n}$, since
$\phi_{ij} = \phi_{1j} \circ \phi_{1i}^{-1}$. With this observation, we can rephrase
the definition of isomorphism above, which we will state as a proposition:

Proposition 24. Using the notation above, multi-tanglegrams $Y$ and $Y'$ are
isomorphic if and only if there exist automorphisms $g_i \in A(T_i)$,
$i = 1, \ldots, n$, satisfying $\phi'_{1i} = g_i \circ \phi_{1i} \circ g_1^{-1}$.


Alternatively, the bijections $\phi_{ij}$ are completely determined by a sequence
$\phi_{12}, \phi_{23}, \ldots, \phi_{n-1,n}$, and thus multi-tanglegrams are called
tangled chains by [22].


3.2. More general classes of graphs. Another direction of generalization
involves considering more general classes of graphs. For example, the tanglegram
layout problem has been studied for rooted phylogenetic networks [35]. Given a
natural number $n$, define an $n$-leaved graph as a graph $U$ along with $n$
distinguished vertices $L(U)$ called leaves.

Definition 25. Given a natural number $n$, define a generalized $n$-tanglegram as
a triple $(U, \phi, V)$, where $U$ and $V$ are a pair of $n$-leaved graphs and $\phi$
is a bijection between $L(U)$ and $L(V)$.


Equivalent statements to those above can also hold in this more general setting.
If we require that $n$-leaved graph automorphisms preserve the leaf set $L(U)$, we
can again define the leaf automorphism group $A(U)$ to be the automorphism group
of $U$ restricted to $L(U)$. If the graphs are such that any graph automorphism is
determined by its action on the leaf set, then generalized tanglegrams on a given
pair of $n$-leaved graphs $U$ and $V$ are in one-to-one correspondence with double
cosets $A(V)\,w\,A(U)$ in $\mathfrak{S}_n$.


3.3. Partitions. Another line of inquiry in computational evolutionary biology
concerns species delimitation, which can naturally be phrased in terms of inference
of a partition of labeled objects. In a manner analogous to phylogenetic trees,
researchers use MCMC to explore the posterior on such partitions [13], and
comparison of the results can be performed using distances between the partitions
[12]. Similar considerations hold for random walks and these distances as described
in the introduction for trees. These partitions can also be thought of as a certain
type of leaf-labeled tree of height two, thus pairs of partitions on the same
underlying set also give a type of tanglegram.

All of the above conclusions hold for such partition tanglegrams as well. The
automorphisms of a partition are a special case of Theorem 5. For example, the
partition 123 | 456 | 78 has automorphism group
$(\mathfrak{S}_3 \wr \mathfrak{S}_2) \times \mathfrak{S}_2$.
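
For partitions, the recursion of Theorem 5 collapses to a closed form: blocks of size $s$ occurring $m$ times contribute a factor $\mathfrak{S}_s \wr \mathfrak{S}_m$ of order $(s!)^m\,m!$. A minimal sketch of that count, assuming the partition is given simply as a list of block sizes (the function name is hypothetical):

```python
from collections import Counter
from math import factorial


def partition_automorphism_order(block_sizes):
    """Order of the automorphism group of a set partition with the given block sizes."""
    order = 1
    for size, multiplicity in Counter(block_sizes).items():
        # Blocks of equal size contribute S_size wr S_multiplicity.
        order *= factorial(size) ** multiplicity * factorial(multiplicity)
    return order


print(partition_automorphism_order([3, 3, 2]))  # 144
```

For the partition 123 | 456 | 78 this gives $(3!)^2 \cdot 2! \cdot 2! = 144$, the order of $(\mathfrak{S}_3 \wr \mathfrak{S}_2) \times \mathfrak{S}_2$.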


4. Enumeration 


Using a computer algebra package such as GAP4 [36], which is able to enumerate
double cosets, and a package such as Sage [37], which can obtain symmetry groups of
graphs, one can apply Proposition 14 to directly enumerate any type of tanglegram
on a given pair of trees. We have provided code to enumerate and work with
tanglegrams at https://github.com/matsengrp/tangle

Figure 5. Counts of various types of tanglegrams.

For the case of binary ordered rooted tanglegrams, an elegant formula for the
total number of tanglegrams on $n$ leaves has recently been found [22]. One can
use this formula, along with the number of tanglegrams on pairs of isomorphic trees,
to compute the number of unordered tanglegrams as follows.

An unordered tanglegram is represented twice in the list of ordered tanglegrams
on $n$ leaves if the two trees are non-isomorphic, or if the trees are isomorphic and
the coset is different when the representative is inverted as in Figure 4. For $n$
leaves, we let $s_n$ be the number of unordered tanglegrams and $t_n$ the number of
ordered tanglegrams, and then let $t_n^{o}$ be the number of ordered tanglegrams and
$t_n^{u}$ the number of unordered tanglegrams on isomorphic pairs of trees. To get
$s_n$, we start with $t_n$ and subtract off half the number of ordered tanglegrams
on non-isomorphic trees for the first case, and then subtract off
$t_n^{o} - t_n^{u}$ for the second. Simplifying
$t_n - (t_n - t_n^{o})/2 - (t_n^{o} - t_n^{u})$, we get
$s_n = (t_n - t_n^{o})/2 + t_n^{u}$ for any $n > 3$.

Such direct enumeration of various types of tanglegrams (Figure 5, Table 1)
suggests that their number grows super-exponentially. In fact, the number of
(binary ordered rooted) tanglegrams is $O(n!\,4^n n^{-3})$, as shown by [22].
There are thus many fewer such tanglegrams than there are pairs of leaf-labeled
trees. Indeed, a simplification of the argument establishing Corollary 8 of [22]
shows that the ratio of the number of ordered pairs of leaf-labeled rooted trees
to the number of binary ordered rooted tanglegrams is asymptotically a constant
times the order of the symmetric group:

$$\frac{((2n-3)!!)^2}{t_n} \sim c \cdot n!,$$

where $t_n$ is the number of binary ordered rooted tanglegrams on $n$ leaves and $c$
is a constant.

Intuitively, although the action of the symmetric group is not always free, “for
most cases it is close” to free. This may suggest that for $n$ leaves, the ratio of
the number of ordered pairs of leaf-labeled unrooted trees to the number of binary
ordered unrooted tanglegrams is also of order $n!$.

Table 1. Enumeration of various types of binary tanglegrams.
These counts have been validated “from below” by checking for
graph isomorphisms between exemplars for n < 6 in the rooted
case, and n < 7 in the unrooted case.

leaves   rooted ord.   rooted unord.   unrooted ord.   unrooted unord.
1                  1               1               1                 1
2                  1               1               1                 1
3                  2               2               1                 1
4                 13              10               2                 2
5                114              69               4                 4
6               1509             807              31                22
7              25595           13048             243               145
8             535753          269221            3532              1875
9           13305590         6660455           62810             31929
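
The claim that the rooted ratio is a constant times $n!$ can be examined numerically with the rooted ordered counts of Table 1. The sketch below is illustrative only; it uses the standard count $(2n-3)!!$ of rooted binary leaf-labeled trees on $n$ leaves, and the names `ROOTED_ORDERED` and `pair_to_tanglegram_ratio` are hypothetical.

```python
from math import factorial

# Rooted ordered tanglegram counts copied from Table 1, keyed by number of leaves.
ROOTED_ORDERED = {4: 13, 5: 114, 6: 1509, 7: 25595, 8: 535753, 9: 13305590}


def double_factorial(m):
    """m!! for odd m >= -1, with the convention (-1)!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def pair_to_tanglegram_ratio(n):
    """(ordered pairs of labeled rooted binary trees) / (tanglegrams) / n!."""
    labeled_pairs = double_factorial(2 * n - 3) ** 2
    return labeled_pairs / ROOTED_ORDERED[n] / factorial(n)


for n in sorted(ROOTED_ORDERED):
    print(n, round(pair_to_tanglegram_ratio(n), 4))
```

For these six values the normalized ratio stays below 1 and increases slowly with $n$, which is consistent with the action of the symmetric group being close to free for most tanglegrams.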


5. Discussion 


Tanglegrams have been an object of study since before DNA sequences were
widely available for the reconstruction of phylogenetic trees [38]. So far they have
been studied in the context of co-evolutionary analyses, classically that between
a host and a parasite, a subject of continuing interest [39, 40]. As such, there
has been extensive work on the case in which the two rooted trees are distinguished
from one another, as when one tree represents hosts and one parasites, which we
call the ordered rooted case. Here we have broadened the definition of tanglegrams
by considering a broader class of underlying graphs, including unordered and/or
unrooted tanglegrams.

In this form, tanglegrams formalize statements concerning pairs of phylogenetic
trees on the same leaf set that do not directly make reference to the labels
themselves. Unordered tanglegrams also do not make reference to the order of the
trees. We observe that many problems in phylogenetic combinatorics “factor” through
a problem on tanglegrams. As such, we believe tanglegrams to be a worthwhile
object of study in phylogenetic combinatorics, and note that they have already been
crucial in an analysis of the geometry of the subtree-prune-regraft graph [11].


These generalized notions of tanglegrams, which are equivalent to the collection
of double cosets formed by the automorphism groups of the two trees, invite further
investigation by combinatorialists. An elegant formula for the number of binary
ordered rooted tanglegrams has recently been found [22], as well as for the
multi-tanglegram case. Here we provide the first several terms of the analogous
sequence for unordered and/or unrooted tanglegrams; Ira Gessel has used the theory
of species to develop a means of enumerating unordered tanglegrams, which will be
described in a forthcoming paper [41]. It would be helpful to have a means of
efficiently sampling other classes of tanglegrams according to familiar distributions
on labeled phylogenetic trees, perhaps building on the method of sampling binary
ordered rooted tanglegrams uniformly at random in [22].

Appendix

Proof of Lemma 3. Assume that two isomorphisms $f, g : T \to S$ induce the same
bijection of $L(T)$ to $L(S)$. Let $\Pi$ be a path between any two leaves of $T$.
Then $f(\Pi)$ and $g(\Pi)$ are paths between the same leaves of $S$ and thus are
identical by definition of a tree. Now, we just need to prove that every internal
vertex $x$ lies on a path joining two leaves. Since $x$ is internal, it belongs to at
least two edges $(x, y)$ and $(x, y')$. Consider a sequence of vertices obtained by
following edges in the graph without backtracking, i.e. such that $(w, z)$ never
follows $(z, w)$, starting with $(x, y)$. Because the tree is finite and contains no
loops by definition, this path will terminate at a leaf. The same argument applied
to $(x, y')$ finds another leaf such that the path between these two leaves contains
$x$. □